(54) System for compressing and decompressing binary representations of dithered images



(57) A method for compression and decompression of dithered images is disclosed. Logical units (tiles) of the binary representation are classified (205) into equivalence classes which are then compressed (206). Each equivalence class represents tiles having similar gray levels (i.e. the same number of black pixels), but which may have different sequences of black and white pixels. Each equivalence class has associated with it a predefined set of rendering exemplars. Each of the exemplars has a similar gray level. Upon decompression, each instance of an equivalence class takes on the value of one of the rendering exemplars, which is selected pseudo-randomly. This effectively causes the image to be redithered so there is no loss of critical image information on decompression.










EP 0 902 398 A2 






Description 

[0001] The present invention is related to the field of data compression, and in particular to LOSSY compression of dithered images.

[0002] A major stumbling block to common use of digitized images is their size. An 8.5 x 11 inch image at 300 dots per inch (dpi) contains roughly 8,000,000 pixels. Even after binarization of a scanned image reduces the number of bits per pixel to 1, this is still 1 megabyte. Compression techniques are typically characterized as LOSSLESS or LOSSY. In LOSSLESS compression, no data is lost in the compression and subsequent decompression. In LOSSY compression, a certain amount of data is lost but it is acceptable since the essence of the compressed data is retained after decompression.
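
The size figures quoted above can be checked with a few lines of arithmetic. The sketch below assumes an 8.5 x 11 inch page scanned at 300 dpi and one bit per pixel after binarization; the function name is illustrative only and row padding is ignored.

```python
def binary_image_size(width_in, height_in, dpi):
    """Return (pixel_count, byte_count) for a 1-bit-per-pixel scan."""
    width_px = round(width_in * dpi)
    height_px = round(height_in * dpi)
    pixels = width_px * height_px
    # One bit per pixel after binarization; round up to whole bytes.
    return pixels, (pixels + 7) // 8


pixels, size_bytes = binary_image_size(8.5, 11, 300)
print(pixels, size_bytes)
```

The result is a little over 8 million pixels and roughly 1 megabyte, consistent with the round numbers given in the text.
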
[0003] Common LOSSLESS compression techniques for binary images, like CCITT Group 3 or Group 4 or MMR, can compress a binary image by factors of 10 to 20. This is still large when compared to the synthetic electronic form used to create a comparable image. Moreover, such compression techniques do not perform well for dithered images. This is because such compression techniques generally depend on the compressor's ability to predict the value of a pixel given neighboring pixels. Dithered images contain many very tiny dots which are intentionally arranged in a pseudo-random pattern. In these images, it is quite difficult to predict the value of a pixel, thus such compression techniques perform poorly.

[0004] Vector Quantization is a LOSSY based method for image compression that is well known. A vector quantizer (VQ) is a quantizer that maps k-dimensional input vectors into one of a finite set of k-dimensional reproduction vectors, or codewords. For image compression the input vector is a fixed grouping of pixels. A VQ can be divided into two parts: an encoder and a decoder. The encoder maps the input vector into a binary code representing the index of the selected reproduction vector, and the decoder maps the binary code into the selected reproduction vector. The reproduction vector becomes the decompressed value of the input vector.
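
The encoder/decoder split described above can be illustrated with a minimal sketch. It assumes 8-pixel binary input vectors packed into one byte, a tiny hypothetical codebook, and Hamming distance as the similarity measure; none of these choices come from the text, which does not specify a distance measure or codebook.

```python
def hamming(a, b):
    """Number of differing bits between two 8-bit vectors."""
    return bin(a ^ b).count("1")


def vq_encode(vector, codebook):
    """Encoder: return the index of the closest reproduction vector."""
    return min(range(len(codebook)), key=lambda i: hamming(vector, codebook[i]))


def vq_decode(index, codebook):
    """Decoder: a plain table lookup."""
    return codebook[index]


# Hypothetical 4-entry codebook of 8-pixel reproduction vectors.
CODEBOOK = [0b00000000, 0b00001111, 0b11110000, 0b11111111]

index = vq_encode(0b00011111, CODEBOOK)
restored = vq_decode(index, CODEBOOK)
```

Only the index travels in the compressed stream, which is why the codebook (the lookup table) must be available to the decoder.
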

[0005] Typically the decoder operates using a simple 
lookup table. To obtain acceptable results upon decom- 
pression, the number of reproduction vectors, and re- 
sulting lookup table, can be quite large. As the lookup 
table may be part of the compressed data stream, a 
large lookup table is undesirable. 
[0006] Vector quantization is conceptually similar to a known method for performing compression on text images by grouping the symbols found into equivalence classes. In this method, symbols are extracted from the binary image and matched to templates for one or more equivalence classes. In order to get good compression, a classifier should operate with a small number of classes.

[0007] An example of image compression based on symbol matching is described in co-pending U.S. Patent Application Serial No. 08/575,305 filed December 20, 1995, entitled "Classification Of Scanned Symbols Into Equivalence Classes". A further example of image compression based on symbol matching is described in U.S. Patent No. 5,303,313 entitled "Method and Apparatus For Compression Of Images", Mark et al., issued April 12, 1994 (the '313 patent). In the '313 patent an image is "precompressed" prior to symbol matching. The '313 patent describes using run-length encoding for such precompression. Symbols are extracted from the run-length representation. A voting scheme is used in conjunction with a plurality of similarity tests to improve symbol matching accuracy. The '313 patent further discloses a template composition scheme wherein the template may be modified based on symbol matches.
[0008] However, the aforementioned symbol based compression techniques do not compress particularly well with respect to pictorial images, in particular dithered images. This is because the pseudo random patterns typically cause a high number of equivalence classes to be created and because a very large number of symbols must be classified (often each dot would be interpreted as a separate symbol).

[0009] A system for compressing and decompressing binary representations of dithered images is disclosed. The currently preferred embodiment of the present invention provides a LOSSY method for compressing dithered images. In LOSSY compression, some of the original image data is lost. It has been determined that for dithered images such as halftoned and error-diffused images an exact reproduction of the original image may not be necessary for acceptable results. The present invention incorporates the idea that maintaining the exact position of edges in a dithered image is not as important as maintaining the gray levels. This is because dithered images contain dots which are intentionally arranged in a pseudo random pattern. Through arrangement of pixels in such a pseudo random pattern, undesirable artifacts, such as streaks or lines, are avoided.

[0010] The compression method of the present invention is generally comprised of the steps of: defining a plurality of equivalence classes for tiles of multi-pixel binary encoded data contained in said binary encoded image, wherein tiles are of a first predetermined organization of binary data and each equivalence class has defined and associated therewith one or more rendering exemplars; classifying each tile in said binary encoded image into an equivalence class; and encoding the equivalence classes by scanline into sequences of literal elements and copy elements, wherein literal elements direct decompression to find the equivalence class in the compressed data stream and copy elements direct decompression to find the equivalence class in the immediately preceding decompressed scanline. On decompression, the sequences of literal elements and copy elements are decoded into their respective equivalence classes a scanline at a time and then a corresponding rendering exemplar is selected for each equivalence class.

Figure 1 is an illustration of a dithered image which may be compressed using the LOSSY compression method of the currently preferred embodiment of the present invention;

Figure 2 is a flowchart describing the general steps for the data compression and decompression method of the currently preferred embodiment of the present invention;

Figure 3 is a table containing the values of the encoded equivalence classes as may be utilized in the currently preferred embodiment of the present invention;

Figure 4 is a flowchart illustrating the steps for compressing a scanline using the data compression method of the currently preferred embodiment of the present invention;

Figure 5 is a table used for identifying which equivalence class a tile belongs to in the currently preferred embodiment of the present invention;

Figure 6 is a block diagram showing the functional components of a compression system for practicing the compression method of Figure 4;

Figure 7 is a block diagram illustrating a compressed scanline data stream as may be utilized in the currently preferred embodiment of the present invention;

Figure 8 is a flowchart illustrating the steps for decompressing a compressed data stream using the decompression method of the currently preferred embodiment of the present invention;

Figure 9 is a block diagram showing the functional components of a decompression system for practicing the decompression method of Figure 8;

Figure 10 is a table containing the values of the rendering exemplars of the currently preferred embodiment of the present invention; and,

Figure 11 is an illustration of a computer based system upon which the currently preferred embodiment of the present invention may be utilized.

[0011] A system for compressing and decompressing binary representations of continuous tone images is disclosed. The present invention may be used in various applications requiring or benefiting from data compression. Such applications may be found as part of an overall image processing system or as stand-alone applications. The currently preferred embodiment of the present invention is implemented as software running on a computer based system. The software is written in the C programming language. The present invention has been preferably implemented for compression of pictorial image data.

[0012] The following terms take on the accompanying meaning in this description:

[0013] Image refers to the markings on or appearance of a medium.

[0014] Image data refers to a representation of an image which may be used for recreating the image.

[0015] Pictorial image refers to non-textual and non line art markings on a medium.

[0016] Tile or Tiles refers to a logical organization of pixels as the primitive object which is classified.

[0017] An equivalence class is a set of tiles found in an image that can be substituted for one another without changing the appearance of an image in an objectionable way.

[0018] The rendering exemplars of the equivalence class are the set of pixel configurations for the equivalence class, one or more of which will be substituted for a member of the equivalence class when the image is decompressed or otherwise recreated. The collection of rendering exemplars for the equivalence classes is referred to as a rendering dictionary.

[0019] Copy element refers to a data element in a compressed data stream which instructs decompression to obtain equivalence class values from the preceding decoded scanline.

[0020] Literal element refers to a data element in a compressed data stream which instructs decompression to obtain equivalence class values in the element itself.

[0021] Compressed Data Stream refers to a compressed representation of an image comprised of copy and literal elements and possibly a corresponding rendering dictionary.

[0022] The currently preferred embodiment of the present invention provides a LOSSY method for compressing dithered images. In LOSSY compression, a certain amount of the data is altered during compression and subsequent decompression. It has been determined that for both halftoned and error-diffused images an exact reproduction of the original image is not necessary for acceptable results. The present invention incorporates the idea that maintaining the exact position of edges in a dithered image is not as important as maintaining the gray levels. This is because dithered images contain dots which are intentionally arranged in a pseudo random pattern. Through arrangement in such a pseudo random pattern, undesirable artifacts, such as streaks or lines, are avoided.

[0023] A dithered pictorial image which may be compressed using the method of the present invention is illustrated in Figure 1. Referring to Figure 1, a document image 100 has both text and pictorial areas. The pictorial area 101 is indicated and shown in a magnified view 102. As described above, the magnified view 102 shows the pictorial area 101 being comprised of dots in a random pattern.

[0024] It should be noted that the text area may also be compressed using the method of the present invention, but the resulting decompressed image may lose too much information (i.e. it will look bad). Various techniques are known to separate text and pictorial areas from a scanned image and apply different compression schemes to each. The method of the present invention may operate in such an implementation.
[0025] The present invention is accomplished by first defining a set of equivalence classes for a logical unit, i.e. a tile, of the image according to a set of predetermined rules. When compressing, each tile in the image is assigned to an equivalence class. Each tile in the equivalence class will have the same gray level, but may have different sequences of black and white pixels.
[0026] Further associated with each equivalence class is a set of rendering exemplars. Each rendering exemplar will produce the same gray level (i.e. they have the same number of black pixels). Upon decompression, the set of equivalence classes representing the image are decoded. For each equivalence class a rendering exemplar is selected from the corresponding set based on some pseudo random criteria (e.g. the scanline).
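
A minimal sketch of exemplar selection follows. The rendering dictionary shown is hypothetical (the real values are those of Figure 10, which are not reproduced in the text), a set bit is assumed to mean a black pixel, and the mixing formula is only one possible stand-in for "some pseudo random criteria"; the text gives the scanline as an example input but does not fix a formula.

```python
# Hypothetical rendering dictionary: class ID -> list of 8-bit exemplars.
# Class 1 stands for "one black pixel per tile" in this sketch.
RENDERING_DICTIONARY = {
    0: [0x00] * 8,
    1: [0x80, 0x08, 0x20, 0x02, 0x40, 0x04, 0x10, 0x01],
}


def select_exemplar(class_id, scanline, tile_index,
                    dictionary=RENDERING_DICTIONARY):
    """Pick one exemplar of the class in a repeatable, scattered way."""
    exemplars = dictionary[class_id]
    # Deterministic stand-in for a pseudo random choice: mix the scanline
    # number and the tile position, then index into the exemplar set.
    slot = (scanline * 5 + tile_index * 3) % len(exemplars)
    return exemplars[slot]


tile = select_exemplar(class_id=1, scanline=1, tile_index=0)
```

Because every exemplar in a class has the same number of black pixels, any choice preserves the gray level of the tile while varying the dot position.
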

[0027] Another way to characterize the present invention is to view it as vector quantization with a reduced size rendering dictionary and wherein the pictorial image is redithered on decompression.

[0028] Figure 2 is a flowchart which describes the general steps of the method of the present invention. First, a set of equivalence classes and corresponding rendering dictionary are created for a particular tile configuration, step 201. In order for a document image to be compressed, a document is scanned to create image data, step 202. The image data is typically a bit-mapped representation of the image. The pictorial portions of the image data are segmented from the text portions of the image data, step 203. It is the pictorial portion that is processed by the present invention. It should be noted that preferably any text contained in the pictorial portion also be segmented out. Further, the entire image may be comprised of pictorial data. The pictorial portion of the image data is then converted into a binary representation to create a representation where each pixel is represented by a single bit, step 204. A multi-level image may typically have a pixel represented by a multi-bit data operand. In order to convert it into a single bit value, a dithering operation is typically performed. This dithering operation provides a smooth and visually pleasing transition between different areas of the image.
[0029] Each of the tiles in the binary representation is then extracted and categorized into a particular equivalence class, step 205. An equivalence class identifier is provided for each tile. In the currently preferred embodiment, a tile is a 1 pixel high and 8 pixel wide block. It is assumed that the equivalence classes are predefined in a manner which corresponds to the tile size. The stream of equivalence class identifiers is then encoded, step 206. The steps 205 and 206 as described herein are in general terms. The preferred manner in which steps 205 and 206 are implemented is described in greater detail below. However, it should be noted that other known techniques may be utilized without departing from the spirit and scope of the present invention. In any event, the encoding step 206 results in a compressed image data stream. The compressed data stream is comprised of a sequence of copy elements (for instructing a decompression process to find each equivalence class from the corresponding position(s) in the preceding decoded scanline) and literal elements (for instructing the decompression process to derive each equivalence class directly from the compressed data stream). Each copy and literal element will represent some number of tiles extracted from the image data. Depending on the implementation of the present invention, the compressed image data stream may or may not include the rendering dictionary.

[0030] It should be noted that in the currently preferred embodiment the encoding of the equivalence class representations is made immediately after the tiles of a scanline have been converted into equivalence classes. This may in fact minimize the amount of internal memory required for performing the compression (e.g. by limiting it to two scanline buffers). However, it would be possible to perform the encoding after all the equivalence classes have been identified for the image, which may enable the use of alternative encoding schemes.

[0031] The compressed data stream may then be stored or transmitted, depending on the purpose for the compressed data stream, step 207.
[0032] When the compressed data stream is to be decompressed, the equivalence class identifier encoding is decoded in order to obtain the equivalence class representation, step 208. A binary representation of the image is created using the equivalence class representations, the rendering dictionary and some pseudo random input such as the number of the current scanline, step 209. Note that this binary representation would typically be different from the originally created binary representation in step 204 since for any particular tile, the specific rendering exemplar used may have a different pixel configuration than the original tile. This in effect causes the image to be redithered.

[0033] In the currently preferred embodiment, the image is broken into tiles 1 pixel tall and 8 pixels wide. Utilization of other tile sizes and dimensions is possible and would not depart from the spirit and scope of the present invention. The contents of each tile are then classified into one of a plurality of predefined equivalence classes. The equivalence classes are defined so that upon decompression, re-dithering may occur.
[0034] In order to achieve significant compression, it is desirable to minimize the number of equivalence classes. For a tile size of 1 x 8 binary values, there could theoretically be a maximum of 256 equivalence classes. An efficient method of defining a minimal number of equivalence classes is needed. In the currently preferred embodiment there are 47 predefined equivalence classes. This number was experimentally determined to provide acceptable visual results on decompression. However, utilization of a different number of equivalence classes is possible and would clearly be within the spirit and scope of the present invention.
[0035] The general rules used for creating the equiv- 
alence classes are as follows: 

1. All members of an equivalence class should have the same number of black pixels.

2. An isolated black pixel can move anywhere inside a tile and still be in the same equivalence class.

3. An isolated block of 2 black pixels can move left or right two pixel positions within a tile while remaining in the same equivalence class.

4. An isolated block of 3 or 4 black pixels can move left or right by 1 pixel position within a tile while remaining in the same equivalence class.

5. A block of 2 or more black pixels attached to the left or right edge of a tile must stay attached to that edge.

6. If there are more black pixels than white pixels in a tile, all the above rules are applied to groups of isolated white pixels instead of applying them to groups of isolated black pixels. For example, this means that the equivalence classes that contain 5 black pixels can be derived from the equivalence classes that contain 3 black pixels by simply inverting all the pixels in each tile of each class.
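
Rules 1, 2 and 6 lend themselves to a short illustration. The sketch below assumes a tile is stored as one byte with a set bit meaning a black pixel; the helper names are hypothetical and the code does not attempt to reproduce the full set of 47 classes.

```python
def black_count(tile):
    """Rule 1: the gray level of a tile is its number of black pixels."""
    return bin(tile & 0xFF).count("1")


def invert(tile):
    """Swap black and white pixels within an 8-pixel tile."""
    return ~tile & 0xFF


def single_black_pixel_class():
    """Rule 2: one isolated black pixel may sit anywhere in the tile."""
    return {1 << shift for shift in range(8)}


def derive_by_inversion(tile_class):
    """Rule 6: obtain a mostly-black class from its mostly-white mirror."""
    return {invert(tile) for tile in tile_class}


one_black = single_black_pixel_class()
seven_black = derive_by_inversion(one_black)
```

Inverting the eight tiles that hold a single black pixel yields the eight tiles that hold a single white pixel, which mirrors the 3-versus-5 example given in rule 6.
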

[0036] The equivalence classes of the currently preferred embodiment are illustrated in the table of Figure 3. In reviewing the table of Figure 3 it should be noted that the tile configurations are shown as hex values rather than the binary values of the actual tile. The hex/binary equivalents are: 0=0000, 1=0001, 2=0010, 3=0011, 4=0100, 5=0101, 6=0110, 7=0111, 8=1000, 9=1001, A=1010, B=1011, C=1100, D=1101, E=1110 and F=1111. It should be noted that in Figure 3, the classes 0-7 may appear to be out of order. It has been determined that the ordering of classes in this manner may possibly lead to greater compression since there are many transitions amongst this group of classes and special coding may be utilized. However, utilization of different ordering of the classes would be within the spirit and scope of the present invention.
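
The hex notation can be expanded back into pixel bits mechanically. The helper below is a small sketch (its name is illustrative) that assumes a tile is written as two hex digits, leftmost pixel first.

```python
def tile_bits(hex_tile):
    """Expand a two-digit hex tile value into its 8 pixel bits."""
    return format(int(hex_tile, 16), "08b")


print(tile_bits("3C"))
```

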
[0037] In the currently preferred embodiment, images are converted to their equivalence class representation and then encoded a scanline at a time. Figure 4 is a flowchart illustrating the steps for compressing a scanline in the currently preferred embodiment of the present invention and corresponds to steps 205-206 of Figure 2. Referring to Figure 4, a tile in a first scanline is extracted from the image, step 401. This extraction is merely taking a grouping of 8 pixels, or as they are represented by binary data, a byte of data in the scanline. The equivalence class in which the tile belongs is identified, step 402. The currently preferred embodiment uses the table in Figure 5 to map a tile to its equivalence class. Referring to Figure 5, the column entry represents the 4 leftmost bits and the row entry represents the 4 rightmost bits of a tile. Equivalence class identification is then merely a table lookup.
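
The table lookup of steps 401-402 can be sketched as follows. The contents of Figure 5 are not reproduced in the text, so the 16 x 16 table here is a placeholder that simply uses the black pixel count as the class; only the indexing scheme (column from the 4 leftmost bits, row from the 4 rightmost bits) follows the description. All names are hypothetical.

```python
def build_placeholder_map():
    """16x16 table indexed [row][column]; NOT the values of Figure 5."""
    return [[bin((col << 4) | row).count("1") for col in range(16)]
            for row in range(16)]


CLASS_MAP = build_placeholder_map()


def classify_tile(tile, class_map=CLASS_MAP):
    """Map one 8-pixel tile (a byte) to its equivalence class ID."""
    column = (tile >> 4) & 0x0F   # 4 leftmost bits select the column
    row = tile & 0x0F             # 4 rightmost bits select the row
    return class_map[row][column]


def classify_scanline(scanline_bytes, class_map=CLASS_MAP):
    """Steps 401-404: convert every tile of a scanline to a class ID."""
    return [classify_tile(b, class_map) for b in scanline_bytes]


classes = classify_scanline(bytes([0x00, 0x81, 0xFF]))
```

Swapping in the real Figure 5 values would only change `build_placeholder_map`; the lookup itself stays a two-index table access.
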

[0038] The equivalence class information is then stored in a first scanline buffer, step 403. As will become apparent in the description below, the first scanline buffer will become a "reference" buffer for encoding the second scanline. The steps 401-403 are repeated until all the tiles in the first scanline are converted into their respective equivalence classes, step 404. The equivalence classes of the first scanline are then encoded into a literal element, step 405. In this case it will be an encoding where the length is the length of the scanline followed by an encoding of the equivalence class identifiers. It should also be noted that for the first scanline, the first element is a copy element having a zero length. Further, in the currently preferred embodiment a Huffman encoding of literal elements and lengths is performed. The Huffman codes utilized are based on experimentation and observation of the frequency patterns of equivalence classes. Utilization of Huffman codes in this manner is known in the art.

[0039] A second scanline is then converted into a set of equivalence classes using basically the same steps as described in steps 401-403 except that the equivalence classes are stored in a second scanline buffer, step 406. Encoding of the equivalence classes henceforth occurs differently since there is now a previous encoded scanline to compare to. It should be noted that encoding for the subsequent scanlines always results in a sequence of alternating copy and literal elements. First a tile's equivalence class of the second scanline buffer is compared to the corresponding position in the first scanline buffer, step 407. If there is a match, the successive equivalence class IDs are compared from the respective buffers and a match length calculated until there is no longer a match, step 408. A copy element's match length is then encoded, step 409. Copy elements are also Huffman coded.

[0040] If there was no match in step 407, a zero match length is encoded in the copy element, step 410. Next, the successive equivalence class definitions in the second scanline buffer are compared to determine if they are in the same equivalence class and a length calculated until a matching pair of equivalence classes is encountered, step 411. A length for a literal element is then encoded along with the equivalence class identifiers represented by the literal element, step 412. The alternating creation of copy elements and literal elements is then repeated for the remainder of the scanline, step 413.

[0041] For the next (e.g. third) scanline, the second scanline buffer is used as the "reference" buffer, while the first scanline buffer is used as the buffer into which equivalence classes are written. This switching of the use of the first and second scanline buffers as the "reference" buffer continues for the entire image.
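
The alternating copy/literal encoding of steps 405-413 can be sketched as below. This is an illustration under stated assumptions, not the patented implementation: elements are represented as hypothetical `("copy", length)` and `("literal", [ids])` tuples, Huffman coding and the special key-point codes are omitted, and a literal run is taken to extend until the current and reference scanlines agree again.

```python
def encode_scanline(current, reference=None):
    """Encode class IDs as alternating copy and literal elements."""
    elements = []
    n = len(current)
    i = 0
    while i < n:
        # Copy element: count positions that match the reference scanline.
        m = 0
        if reference is not None:
            while i + m < n and current[i + m] == reference[i + m]:
                m += 1
        elements.append(("copy", m))
        i += m
        if i == n:
            break
        # Literal element: run until the next position matching the reference.
        j = n
        if reference is not None:
            j = i + 1
            while j < n and current[j] != reference[j]:
                j += 1
        elements.append(("literal", list(current[i:j])))
        i = j
    return elements


def encode_image(class_scanlines):
    """Each encoded scanline becomes the reference for the next one."""
    reference = None
    encoded = []
    for current in class_scanlines:
        encoded.append(encode_scanline(current, reference))
        reference = current
    return encoded


stream = encode_image([[1, 1, 2, 3, 3], [1, 1, 5, 6, 3]])
```

For the first scanline there is no reference, so the output is a zero-length copy element followed by one literal element covering the whole scanline, as described for step 405. Rebinding `reference` plays the role of swapping the two scanline buffers.

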
[0042] It is noted that this technique of using a previously encoded scanline as a reference buffer is a well known technique used for example in CCITT Group 4 compression. However, other types of encodings for a stream of equivalence class identifiers could be utilized, such as a Lempel-Ziv technique, without departing from the spirit and scope of the present invention. Figure 6 is a block diagram illustrating the functional components for performing the compression method of Figure 4. Referring to Figure 6, an Uncompressed Image Data Stream 601 is coupled to an equivalence class identifier 602. The Uncompressed Image Data Stream 601 contains the original uncompressed scanline data. Further coupled to the equivalence class identifier 602 is an equivalence class map 603. The equivalence class map 603 may be stored in a conventional storage device and will contain the information as shown in Figure 5. The output of the equivalence class identifier 602 switches between scanline buffers 604 and 605. As noted above, since encoding of the equivalence classes depends on the equivalence classes of the previous scanline, a copy of the previous scanline is maintained. Each of the scanline buffers 604 and 605 is further coupled to equivalence class encoder 606. The equivalence class encoder 606 compares the contents of the scanline buffers and encodes them into copy or literal elements as appropriate, which are stored as a Compressed Data Stream 607. Further coupled to the equivalence class encoder are encoding tables 608. The encoding tables 608 contain the information needed to perform the Huffman encoding on the copy and literal elements.
[0043] Figure 7 is a block diagram illustrating a compressed scanline data stream. Referring to Figure 7, as noted above for a compressed scanline the first element is always a copy element, here copy element 701. The information contained in the copy element 701 is a match length. The next element, if one exists, will be a literal element 702. Note that no literal element would be encoded if the equivalence class identifiers for the current scanline and the reference scanline were identical. The literal element contains a length along with a corresponding number of equivalence class identifiers 703. If there is a next element it will be a copy element 704, again comprising a match length as the primary information. If the scanline is not completed, this will again be followed by a literal element 705 and equivalence class identifiers 706. In practice, a copy element is merely an encoding of the match count while a literal element is an encoding of a length followed by the corresponding number of equivalence class encodings.
[0044] As noted above compression is further 
achieved through a Huffman encoding of the length in- 
formation for the literal and copy elements of the created 
compressed data stream. It should be noted that the 
codes for the lengths of literal and copy elements are 
different. This is because the observed frequency char- 
acteristics for the two types of elements differ. 
[0045] Further, the currently preferred embodiment contains other special encodings for the lengths for copy elements. A code is provided for indicating copying to the end of the scanline. Other codes are provided to indicate copying to the next (or the next after the next) key point. A key point is defined as a point in the previously encoded scanline where a transition was made from encoding of a copy element to encoding of a literal element. Note that this type of encoding is also done in CCITT Group 4 encoding, so no further description of this type of encoding is deemed necessary.

[0046] As described above, each equivalence class 
is represented by a plurality of rendering exemplars. In 
the currently preferred embodiment, each equivalence 
class is represented by a set of 8 rendering exemplars. 
Note that in some instances the 8 rendering exemplars may not all be unique (e.g. the case where an equivalence class represents only a single tile configuration). However, for the most part, the 8 rendering exemplars were chosen because they present a pleasing visual appearance when used to re-dither the equivalence class on decoding.

[0047] As described above, in the currently preferred embodiment, scanline encoding/compression is performed based on the content of the immediately previous scanline. A similar process is performed in decoding/decompression. Figure 8 is a flowchart describing the steps for decompression. Referring to Figure 8, a first scanline is decoded to create a decoded scanline in a current scanline buffer, step 801. The decoded scanline is comprised of a scanline of equivalence class identifiers. In the currently preferred embodiment, information indicating the length of a scanline is obtained from descriptive information which describes the image. For example, a document image encoded in the Tagged Image File Format (TIFF) will include a tag having image width information from which scanline information may be derived. So the decoding occurs until the determined number of equivalence class identifiers have been decoded. Rendering exemplars for each of the equivalence classes are then identified and output as a decompressed scanline, step 802. In any event, once a scanline is decoded and equivalence class exemplars identified, the "current" scanline buffer becomes a "reference scanline buffer" and the next scanline becomes the current scanline and is decoded based in part on the decoded equivalence class identifiers found in the reference scanline buffer. The process begins by first obtaining a copy element for the current scanline and switching the current scanline buffer to a reference scanline buffer, step 803. As described above, in the currently preferred embodiment, this first element will always be a copy element, or more precisely it is presumed to be a count (which may be zero). The copy element is decoded to obtain a length M, step 804. It should be noted that as described above, the length M may represent a key point in the reference scanline. In any event, the number of equivalence class IDs represented by length M in the corresponding positions in the reference scanline buffer are then copied to the current scanline buffer, step 805. Note that if length M is zero, then no equivalence class IDs are copied.
[0048] The next literal element, if one exists, is decod- 
ed to determine a length N, step 806. The N equivalence 
class IDs included in the literal element are then copied 
into the current scanline buffer, step 807. The decoding 
of the current scanline per steps 803-807 continues
until enough equivalence class IDs have been decoded
to fill the scanline, step 808. When the decoding
of the current scanline is completed, the rendering ex- 
emplars for the various equivalence classes in the cur- 
rent scanline are obtained and output as a decoded 
scanline, step 809. 

[0049] The steps 803-809 are then repeated for the 
remainder of the scanlines in the image, step 810. 
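
The decoding loop of steps 801-810 can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation of the preferred embodiment: the compressed stream is assumed to have already been parsed into per-scanline lists of hypothetical `("copy", M)` and `("literal", [ids])` tuples, a copy of length M is assumed to take the IDs at the corresponding positions of the reference scanline, and the first scanline is assumed to be decoded against an empty reference. The key-point interpretation of M, the bit-level element format, and the output of rendering exemplars are left out.

```python
def decode_scanline(elements, reference, width):
    """Rebuild one scanline of equivalence class IDs.

    elements  -- hypothetical pre-parsed list of ("copy", M) or
                 ("literal", [id, ...]) tuples, starting with a copy element
    reference -- previously decoded scanline (empty for the first scanline)
    width     -- number of equivalence class IDs per scanline
    """
    current = []
    for kind, payload in elements:
        if kind == "copy":
            m = payload
            start = len(current)
            if start + m > len(reference):
                raise ValueError("copy element runs past the reference scanline")
            # Step 805: copy M IDs from the corresponding positions.
            current.extend(reference[start:start + m])
        elif kind == "literal":
            # Step 807: the N IDs are carried in the literal element itself.
            current.extend(payload)
        else:
            raise ValueError(f"unknown element kind: {kind!r}")
        if len(current) >= width:
            # Step 808: enough IDs have been decoded to fill the scanline.
            break
    if len(current) != width:
        raise ValueError("elements do not fill the scanline exactly")
    return current


def decode_image(scanline_elements, width):
    """Decode every scanline, turning each decoded line into the next reference."""
    reference = []
    decoded = []
    for elements in scanline_elements:
        current = decode_scanline(elements, reference, width)
        decoded.append(current)
        reference = current  # Step 803: current buffer becomes the reference.
    return decoded


rows = decode_image(
    [
        [("copy", 0), ("literal", [3, 3, 5, 7])],
        [("copy", 2), ("literal", [6]), ("copy", 1)],
    ],
    width=4,
)
```

In this example the second scanline reuses two IDs from the line above, takes one literal ID, and then copies the final ID, so only one new ID has to be carried for that line.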
[0050] Figure 9 is a block diagram of the functional 
components of a decompression system for performing 
the decompression described in the flowchart of Figure 
8. Compressed Data stream 901 is input to an element 
decoder unit 902. The element decoder unit 902 is com- 
prised of a copy element decoder 903 and a literal ele- 
ment decoder 904. The copy element decoder 903 will 
decode the match length information for copy elements. 
The literal element decoder 904 will decode the count 
information as well as the corresponding number of equiv-
alence class IDs. Each of the copy element decoder 903
and the literal element decoder 904 is coupled to a buffer 
unit 905. The buffer unit 905 is comprised of scanline 
buffer 1 906 and scanline buffer 2 907. The buffer unit
905 further contains buffer control 908 which manages
access to the buffers (e.g. determines which is the
reference buffer used by the copy element decoder, when
the buffer can be output, etc.).

[0051] The copy element decoder 903 will send control
information to the buffer unit 905 for controlling copying
of equivalence class IDs between scanline buffer 1 906
and scanline buffer 2 907. The literal element decoder 904
will send decoded equivalence class IDs to the buffer 
unit 905 for storing in one of scanline buffer 1 906 or 
scanline buffer 2 907. 
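
The interaction between the two element decoders and the buffer unit can be sketched in software as a pair of buffers whose roles are swapped after each scanline. The class name `BufferUnit` and its method names are hypothetical and chosen only for illustration; the source describes hardware blocks 905-908 and does not prescribe this interface.

```python
class BufferUnit:
    """Two scanline buffers plus the control state that says which is which."""

    def __init__(self, width):
        self.buffers = [[None] * width, [None] * width]
        self.current = 0  # index of the buffer currently being filled

    @property
    def reference(self):
        # The buffer that is not current acts as the reference buffer.
        return 1 - self.current

    def swap(self):
        """The filled current buffer becomes the reference buffer."""
        self.current = self.reference

    def copy_from_reference(self, position, length):
        """Control request from the copy element decoder."""
        src = self.buffers[self.reference]
        dst = self.buffers[self.current]
        dst[position:position + length] = src[position:position + length]
        return position + length

    def store_literal(self, position, ids):
        """Decoded IDs sent by the literal element decoder."""
        dst = self.buffers[self.current]
        dst[position:position + len(ids)] = ids
        return position + len(ids)

    def output(self):
        """Return a copy of the current scanline of equivalence class IDs."""
        return list(self.buffers[self.current])


unit = BufferUnit(width=4)
unit.store_literal(0, [3, 3, 5, 7])
first = unit.output()

unit.swap()
pos = unit.copy_from_reference(0, 2)
pos = unit.store_literal(pos, [6])
pos = unit.copy_from_reference(pos, 1)
second = unit.output()
```

Swapping an index rather than copying buffer contents means the previous scanline is available as the reference without any data movement.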

[0052] Further coupled to the buffer unit 905 to re- 
ceive scanlines of decoded equivalence class IDs is 
dither unit 909. The dither unit 909 is used for determin- 
ing the rendering exemplar to be used for rendering the 
corresponding equivalence class. Coupled to the dither
unit 909 is a rendering dictionary 910. The rendering dic- 
tionary contains the sets of rendering exemplars used 
for each equivalence class. The output of the dither unit 
909 is the decompressed image data stream 911.
[0053] Figure 10 is a table illustrating the rendering 
exemplars for the equivalence classes of the currently 
preferred embodiment of the present invention. The ta- 
ble of Figure 10 may also be considered the rendering 
dictionary for the equivalence classes and would be 
contained in the rendering dictionary 910 of Figure 9. 
The rendering dictionary enables the decompressed 
pictorial image to remain dithered (or to be re-dithered). 
The creation of the rendering exemplars is based in part 
on the number of tile configurations and an estimate of
the aesthetic effect obtained by filling a region with a
single equivalence class so as to create no streaks or
bars. Note that other sets of rendering exemplars may
be used, so long as an acceptable aesthetic effect is
maintained, without departing from the spirit and scope
of the present invention.

[0054] In the currently preferred embodiment, the rendering
exemplar used is determined by selecting the
exemplar corresponding to the value of the scanline
number being processed modulo the number of exemplars,
e.g. 8. So for example, for the 25th scanline, 25
modulo 8, i.e. 1, would cause the second exemplar in
the corresponding set to be used. Note that the exemplars
in each set are numbered 0-7. It should be noted that
other methods could be used for selecting the exemplar
(e.g. a completely random scheme), and would not cause
departure from the spirit and scope of the present invention.
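
The modulo selection rule can be illustrated with a short sketch. The dictionary contents below are placeholder strings, not the tile patterns of Figure 10, and the function names are hypothetical; only the indexing rule (scanline number modulo the number of exemplars) is taken from the description.

```python
def select_exemplar(exemplars, scanline_number):
    """Pick the exemplar indexed by the scanline number modulo the set size."""
    return exemplars[scanline_number % len(exemplars)]


def render_scanline(class_ids, scanline_number, rendering_dictionary):
    """Map each equivalence class ID in a scanline to one rendering exemplar."""
    return [
        select_exemplar(rendering_dictionary[class_id], scanline_number)
        for class_id in class_ids
    ]


# Placeholder dictionary: 3 classes with 8 exemplars each, numbered 0-7.
rendering_dictionary = {
    class_id: [f"class{class_id}-exemplar{i}" for i in range(8)]
    for class_id in range(3)
}

# Scanline 25: 25 modulo 8 is 1, so exemplar 1 (the second one) is used.
tiles = render_scanline([0, 2, 2], 25, rendering_dictionary)
```

Because the index changes from one scanline to the next, vertically adjacent tiles of the same class receive different exemplars, which is what keeps a uniform region from showing streaks or bars.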
[0055] The computer based system on which the currently
preferred embodiment of the present invention
may be used is described with reference to Figure 11.
Referring to Figure 11, the computer based system is
comprised of a plurality of components coupled via a
bus 1101. The bus 1101 illustrated here is simplified in
order not to obscure the present invention. The bus 1101
may consist of a plurality of parallel buses (e.g. address,
data and status buses) as well as a hierarchy of buses
(e.g. a processor bus, a local bus and an I/O bus). In
any event, the computer system is further comprised of
a processor 1102 for executing instructions provided via
bus 1101 from Internal memory 1103 (note that the Internal
memory 1103 is typically a combination of Random
Access or Read Only Memories). Such instructions
are those that are preferably implemented in software
for carrying out the processing steps outlined above in
the flowcharts of Figures 2, 4 and 8. The processor 1102
and Internal memory 1103 may be discrete components
or a single integrated device such as an Application
Specific Integrated Circuit (ASIC) chip. Further, the
combination of processor 1102 and Internal memory
1103 comprises circuitry for performing the functionality
of the present invention so that the currently preferred
embodiment of the present invention could be implemented
on a single ASIC or other integrated circuit chip.
[0056] Also coupled to the bus 1101 are a keyboard
1104 for entering alphanumeric input, external storage
1105 for storing data such as a compressed text image
data file, a cursor control device 1106 for manipulating
a cursor, and a display 1107 for displaying visual output.
The keyboard 1104 would typically be a standard QWERTY
keyboard but may also be a telephone like keypad.
The external storage 1105 may be a fixed or removable
magnetic or optical disk drive. The cursor control
device 1106 will typically have a button or switch associated
with it to which the performance of certain functions
can be programmed. Further coupled to the bus
1101 is a scanner 1108. The scanner 1108 provides a
means for creating a bitmapped representation of a medium
(i.e. a scanned document image).
[0057] Further elements that could typically be coupled
to the bus 1101 would include printer 1109, facsimile
element 1110 and network connection 1111. The
printer 1109 could be used to print the bitmapped
representation. The facsimile element 1110 may contain an
element used to transmit image data that has been
compressed using the present invention. Alternatively,
the facsimile element 1110 could include an element for
decompression of a document image compressed using
the present invention. The network connection 1111
would be used to receive and/or transmit data containing
image data. Thus, the image data utilized by the
present invention may be obtained through a scanning
process, via a received fax or over a network.



Claims 

1. A method for processing a binary representation of
a dithered image so that it may be compressed with- 
out losing essential information, said method com- 
prising the steps of: 

a) defining a plurality of equivalence classes for 
tiles of multi-pixel binary data contained in said 
dithered image, wherein tiles are of a predeter- 
mined organization of binary data, each of said 
equivalence classes further having defined and 
associated therewith a plurality of rendering ex- 
emplars having the same gray level; 

b) identifying one equivalence class for each 
tile in said dithered image; 

c) encoding each of said equivalence classes 
into an alternating sequence of literal elements 
and copy elements, said literal elements con- 
taining equivalence class information for one or 
more consecutive tiles and said copy elements 
containing information for copying previously 
encoded equivalence class information. 

2. A method for decompressing a compressed binary
representation of a dithered image, said dithered
image organized into a plurality of tiles representing
multiple pixels, said method comprising the steps
of:

a) decoding said compressed binary representation
into a plurality of literal elements and
copy elements;

b) decoding each of said literal elements and
copy elements into equivalence class identifiers,
each equivalence class identifier corresponding
to a tile of said dithered image;

c) selecting an equivalence class exemplar
from a rendering dictionary for each equivalence
class identifier, said rendering dictionary
comprised of a plurality of representative tile
configurations for said class.

3. A system for compression and decompression of
dithered images, said system comprising:

input means for receiving a dithered image;

a processor for performing operations for compressing
said dithered image and decompressing
a compressed representation of said dithered
image;

storage means for storing tile equivalence class
definitions, each of said tile equivalence class
definitions for categorizing a plurality of tile
configurations into a single class;

said storage means further for storing data including
operations for compressing said dithered
image, said operations including:

operations for creating a binary representation
of said dithered image;

operations for extracting tiles from said binary
representation and determining the equivalence
class for a tile;

operations for compressing a predetermined
collection of tiles based on their determined
equivalence class creating an alternating sequence
of literal elements and copy elements;

said storage means further for storing equivalence
class exemplars, each of said equivalence
class exemplars comprising a plurality of
representative tile configurations for said class,
each of said tile configurations have the same
gray level;

said storage means further for storing data including
operations for decompressing a compressed
representation of a dithered image including:

operations for decompressing sequences of literal
and copy elements into instances of equivalence
classes;

operations for selecting one of said plurality of
representative tile configurations for an instance
of an equivalence class.